1.   INTRODUCTION

In studying a particular sample, the order in which the elements
of the sample are drawn is frequently available. One reasonable
mathematical method for handling this information is to make use
of the distribution of runs. A run is defined as a succession of
similar events preceded and succeeded by different events or no
events. The number of elements in a run will be referred to as the
length of the run. The total number of runs, the length of the
longest run, and various other run statistics can be used as the
sample information with which to test for randomness of
arrangement against the alternative of sequence dependency. The
same run statistics can also be used to test whether two sampled
populations are identical, whether trend exists in a sampled
population, and so forth.

The distribution theory of runs seems to have been started toward
the end of the nineteenth century. The following history is given
by Mood (1940). Karl Pearson (1897) regarded the distribution of
runs as a special case of the multinomial distribution. An
expression for the mean of the number of iterations of a given
length from a binomial population was derived by Karl Marbe
(1899). Grunbaum (1904) and Bruns (1906) derived the mean of the
number of runs of given length from a binomial population by the
multinomial method. In a book published in 1907, Von Bortkiewicz
correctly derived for the first time the mean and variance of runs
from a binomial population using a method similar to that of
Bruns. Von Mises (1921) showed that the number of long runs of a
given length is approximately distributed according to the Poisson
law for large sample sizes. It was not until 1925 that an actual
distribution function appeared, when Ising (1925) gave the number
of ways of obtaining a given total number of runs (without regard
to length) from arrangements of two kinds of elements.

Stevens (1939) published the same distribution and described a
$\chi^2$ criterion for significance. Wald and Wolfowitz (1940)
published the same distribution and showed that it was
asymptotically normal. Wald and Wolfowitz's paper described a very
interesting application of the distribution to the problem of
testing the hypothesis that two samples have come from the same
continuous distribution. A. M. Mood (1940), in an interesting
paper, investigated different problems concerning runs. He derived
distributions of runs of given length from random arrangements of
fixed numbers of elements of two or more kinds, and from binomial
and multinomial populations. Wolfowitz (1944a) derived a
distribution of runs up and down and also the asymptotic
distribution of runs up and down. Levene and Wolfowitz (1944)
obtained the covariance matrix of runs up and down and gave
several methods for using runs up and down in tests of
significance. Swed and Eisenhart (1943) derived tables for testing
randomness of grouping in a sequence of alternatives. The
asymptotic properties of the Wald-Wolfowitz test of randomness
were derived by Noether (1950). In later years Kruskal (1952),
Mood (1954), Dixon (1954), Goodman (1957), Ferguson and Kraft
(1955), and Weiss (1960) developed runs tests.


In this report the distribution theory of runs is developed in
section 2. In section 3, runs tests are given; the Wald-Wolfowitz
total number of runs test is discussed explicitly there. Runs up
and down are treated in section 4, where the total number of runs
up and down and chi-square applied to run frequencies are also
discussed. Finally, in section 5, applications of runs tests are
given.

2.   DISTRIBUTION THEORY OF RUNS

2.1  Runs  of  Two  Kinds  of  Elements 

Consider a sample space constructed using n elements of two kinds,
$n_1$ a's and $n_2$ b's, with $n_1 + n_2 = n$. Any particular
sample point is a sequence of a's and b's which consists of
alternating runs of a's and b's. For each sample point let
$r_{1i}$ denote the number of runs of a's of length i and let
$r_{2i}$ denote the number of runs of b's of length i. Thus the
sequence

a a a b b a a b a a b b a b

has $r_{11} = 1$, $r_{12} = 2$, $r_{13} = 1$, $r_{21} = 2$,
$r_{22} = 2$, and all other $r_{ij}$'s are zero. Let
$r_1 = \sum_i r_{1i}$ and $r_2 = \sum_i r_{2i}$ denote the total
number of runs of a's and b's respectively.
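
The run counts defined above can be tabulated mechanically. The following is a minimal sketch, not part of the original report; the function name `run_length_counts` and the illustrative sequence are assumptions chosen here.

```python
from collections import Counter
from itertools import groupby


def run_length_counts(sequence):
    """Map each symbol to a Counter {run length: number of runs of that length}."""
    counts = {}
    for symbol, group in groupby(sequence):
        length = sum(1 for _ in group)
        counts.setdefault(symbol, Counter())[length] += 1
    return counts


# Illustrative sequence (an assumption): 8 a's and 6 b's.
example = "aaabbaabaabbab"
counts = run_length_counts(example)

# r_1 and r_2 are the total numbers of runs of a's and of b's.
r1 = sum(counts["a"].values())
r2 = sum(counts["b"].values())
print(dict(counts["a"]), dict(counts["b"]), r1, r2)
```

Here `counts["a"][i]` plays the role of $r_{1i}$ and `counts["b"][i]` the role of $r_{2i}$.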

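Suppose a set of numbers $r_{ij}$ ($i = 1, 2$; $j = 1, 2, \ldots, n_i$)
such that $\sum_j j\, r_{ij} = n_i$ is given. Then the numbers of
ways of arranging the $r_1$ runs of a's and the $r_2$ runs of b's
are the multinomial coefficients

$$\begin{bmatrix} r_1 \\ r_{1j} \end{bmatrix} \quad \text{and} \quad \begin{bmatrix} r_2 \\ r_{2j} \end{bmatrix}$$

respectively. Thus the total number of ways of obtaining the set
$\{r_{ij}\}$ is

$$N(r_{ij}) = \begin{bmatrix} r_1 \\ r_{1j} \end{bmatrix} \begin{bmatrix} r_2 \\ r_{2j} \end{bmatrix} F(r_1, r_2) \qquad (2.1.1)$$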


where $F(r_1, r_2)$ is the number of ways of arranging $r_1$
objects of one kind and $r_2$ objects of a second kind so that no
two adjacent objects are of the same kind. Then

$$F(r_1, r_2) = \begin{cases} 0, & |r_1 - r_2| > 1 \\ 1, & |r_1 - r_2| = 1 \\ 2, & r_1 = r_2 \end{cases} \qquad (2.1.2)$$


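But there are $\binom{n}{n_1}$ possible arrangements of a's and
b's. If each of these arrangements is equally likely, then the
probability of obtaining the set $\{r_{ij}\}$ is

$$P(r_{ij}) = \frac{\begin{bmatrix} r_1 \\ r_{1j} \end{bmatrix} \begin{bmatrix} r_2 \\ r_{2j} \end{bmatrix} F(r_1, r_2)}{\binom{n}{n_1}} \qquad (2.1.3)$$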


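The probability distributions of the set $\{r_{1j}\}$ and of
$r_1$, $r_2$ will be derived now. By summing
$\begin{bmatrix} r_2 \\ r_{2j} \end{bmatrix}$ over all partitions
of $n_2$, the probability distribution of the set $r_{1j}$ can be
obtained.

Notation. The symbol

$$\begin{bmatrix} m \\ m_i \end{bmatrix} = \frac{m!}{m_1!\, m_2! \cdots m_k!}$$

denotes the multinomial coefficient. When the multinomial
coefficient is to be summed over the indices $m_i$, the conditions
$\sum_i m_i = m$, $m_i \ge 0$ will always hold. The symbol

$$\binom{m}{k} = \frac{m(m-1) \cdots (m-k+1)}{k!}$$

denotes the binomial coefficient.

The summation over all partitions of $n_2$ is aided by first
finding the coefficient of $x^{n_2}$ in

$$(x + x^2 + x^3 + \cdots)^{r_2} = \left( \frac{x}{1-x} \right)^{r_2} = x^{r_2} \sum_{t=0}^{\infty} \binom{r_2 - 1 + t}{r_2 - 1} x^t .$$

The term corresponding to $t = n_2 - r_2$ gives the desired
result, so we may write the coefficient of $x^{n_2}$ as
$\binom{n_2 - 1}{r_2 - 1}$. Thus the desired summation over all
partitions of $n_2$ is

$$\sum \begin{bmatrix} r_2 \\ r_{2j} \end{bmatrix} = \binom{n_2 - 1}{r_2 - 1} \qquad (2.1.4)$$

Then

$$P(r_{1j}, r_2) = \frac{\begin{bmatrix} r_1 \\ r_{1j} \end{bmatrix} \binom{n_2 - 1}{r_2 - 1} F(r_1, r_2)}{\binom{n}{n_1}} \qquad (2.1.5)$$

Summing equation (2.1.5) over $r_2$, and simplifying,

$$P(r_{1j}) = \frac{\begin{bmatrix} r_1 \\ r_{1j} \end{bmatrix} \binom{n_2 + 1}{r_1}}{\binom{n}{n_1}} \qquad (2.1.6)$$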


Summing (2.1.3) over $r_{1j}$ and $r_{2j}$ gives, by means of
(2.1.4),

$$P(r_1, r_2) = \frac{\binom{n_1 - 1}{r_1 - 1} \binom{n_2 - 1}{r_2 - 1} F(r_1, r_2)}{\binom{n}{n_1}} \qquad (2.1.7)$$

The distribution (2.1.7) was derived by Wald and Wolfowitz in
1939.

The probability function of $r_1$, the total number of runs of
a's, is obtained by summing (2.1.7) over $r_2$:

$$P(r_1) = \frac{\binom{n_1 - 1}{r_1 - 1} \binom{n_2 + 1}{r_1}}{\binom{n}{n_1}} \qquad (2.1.8)$$

This distribution is discussed by Stevens (1939). A similar
expression holds for the probability function of $r_2$.
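
As a numerical illustration of the probability function of $r_1$, the sketch below compares a brute-force enumeration of all equally likely arrangements with the closed form $\binom{n_1-1}{r_1-1}\binom{n_2+1}{r_1}/\binom{n}{n_1}$. It is a check under stated assumptions, not part of the original derivation; the function names are hypothetical and $n_1 \ge 1$ is assumed.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, groupby
from math import comb


def runs_of_a_pmf_exact(n1, n2):
    """Enumerate every arrangement of n1 a's and n2 b's and tally r_1."""
    n = n1 + n2
    tally = Counter()
    for positions in combinations(range(n), n1):
        chosen = set(positions)
        seq = ["a" if i in chosen else "b" for i in range(n)]
        r1 = sum(1 for symbol, _ in groupby(seq) if symbol == "a")
        tally[r1] += 1
    total = comb(n, n1)
    return {r: Fraction(c, total) for r, c in sorted(tally.items())}


def runs_of_a_pmf_formula(n1, n2):
    """Closed form for P(r_1), with r_1 ranging from 1 to min(n1, n2 + 1)."""
    total = comb(n1 + n2, n1)
    return {
        r: Fraction(comb(n1 - 1, r - 1) * comb(n2 + 1, r), total)
        for r in range(1, min(n1, n2 + 1) + 1)
    }


print(runs_of_a_pmf_exact(4, 3))
print(runs_of_a_pmf_formula(4, 3))
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the two tables can be compared for equality rather than to within rounding.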

Mood (1940) derived several distributions useful for applications.
He added together long runs to form new variables, decreasing the
number of variables compared with (2.1.3) and (2.1.6).

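One of the marginal distributions is obtained by summing (2.1.6)
over $r_{1i}$ for $i \ge k$. Let

$$s_{1j} = r_{1j}, \quad j < k; \qquad s_1 = r_1; \qquad s_{1k} = \sum_{j \ge k} r_{1j}; \qquad A = \sum_{j=1}^{k-1} j\, r_{1j} .$$

The multinomial coefficient

$$\frac{s_{1k}!}{r_{1k}!\, r_{1,k+1}! \cdots r_{1 n_1}!}$$

must be summed over all partitions of $n_1 - A$ such that every
part is greater than $k - 1$. This can be obtained from the
coefficient of $x^{n_1 - A}$ in

$$(x^k + x^{k+1} + \cdots)^{s_{1k}} = \left( \frac{x^k}{1-x} \right)^{s_{1k}} = x^{k s_{1k}} \sum_{t=0}^{\infty} \binom{s_{1k} - 1 + t}{s_{1k} - 1} x^t ,$$

which gives

$$\sum_{(r)} \frac{s_{1k}!}{r_{1k}! \cdots r_{1 n_1}!} = \binom{n_1 - A - (k-1) s_{1k} - 1}{s_{1k} - 1} \qquad (2.1.9)$$

where $\sum_{(r)}$ denotes summation over all non-negative
integers $r_{1k}, r_{1,k+1}, \ldots, r_{1 n_1}$ such that
$\sum_{j=k}^{n_1} j\, r_{1j} = n_1 - A$, with $s_{1k}$ held fixed.
This identity with (2.1.6) gives

$$P(s_{1i}) = \frac{\begin{bmatrix} s_1 \\ s_{1i} \end{bmatrix} \binom{n_2 + 1}{s_1} \binom{n_1 - A - (k-1) s_{1k} - 1}{s_{1k} - 1}}{\binom{n}{n_1}}, \quad i = 1, 2, \ldots, k \qquad (2.1.10)$$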

Another marginal distribution can be derived by considering runs
of both kinds of elements. Defining $s_{2j}$ ($j = 1, 2, \ldots, h$)
and B in terms of $r_{2j}$ just as $s_{1i}$ and A were defined
before, it follows from (2.1.3) and (2.1.10) that

$$P(s_{1i}, s_{2j}) = \frac{\begin{bmatrix} s_1 \\ s_{1i} \end{bmatrix} \begin{bmatrix} s_2 \\ s_{2j} \end{bmatrix} \binom{n_1 - A - (k-1) s_{1k} - 1}{s_{1k} - 1} \binom{n_2 - B - (h-1) s_{2h} - 1}{s_{2h} - 1} F(s_1, s_2)}{\binom{n}{n_1}},$$

$$i = 1, 2, \ldots, k; \quad j = 1, 2, \ldots, h .$$

Here k and h in the new variables $s_{1k}$ and $s_{2h}$ can be
chosen so that the number of variables is appropriate for the data
in hand.
2.2  Moments of Runs $r_1$, $r_2$ of Two Kinds of Elements


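Instead of dealing with ordinary moments, the easiest way to find
moments of $r_1$ is by means of factorial moments, because the
expressions are much more compact. Write
$x^{[g]} = x(x-1) \cdots (x-g+1)$. For the g-th factorial moment
$\mu'_{[g]}$ we get, using (2.1.8),

$$\mu'_{[g]} = \frac{1}{\binom{n}{n_1}} \sum_{r_1} r_1^{[g]} \binom{n_1 - 1}{r_1 - 1} \binom{n_2 + 1}{r_1} \qquad (2.2.1)$$

which can be written as

$$\mu'_{[g]} = \frac{(n_2 + 1)^{[g]}}{\binom{n}{n_1}} \sum_{r_1 = g} \binom{n_1 - 1}{r_1 - 1} \binom{n_2 + 1 - g}{r_1 - g} \qquad (2.2.2)$$

But it follows from the identity

$$\sum_{i=0} \binom{A}{C + i} \binom{B}{i} = \binom{A + B}{C + B} \qquad (2.2.3)$$

that (2.2.2) can be written as

$$\mu'_{[g]} = \frac{(n_2 + 1)^{[g]} \binom{n - g}{n_1 - g}}{\binom{n}{n_1}} = \frac{(n_2 + 1)^{[g]}\, n_1^{[g]}}{n^{[g]}} \qquad (2.2.4)$$

from which the mean and the variance of $r_1$ can be found to be:

$$\mu(r_1) = \frac{n_1 (n_2 + 1)}{n}, \qquad \sigma^2(r_1) = \frac{(n_2 + 1)^{[2]}\, n_1^{[2]}}{n\, n^{[2]}} \qquad (2.2.5)$$

Similar formulas exist for the mean and the variance of $r_2$.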
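
The factorial-moment result can be checked numerically. The sketch below sums $r_1^{[g]} P(r_1)$ directly from the probability function of $r_1$ and compares it with the closed form $(n_2+1)^{[g]} n_1^{[g]} / n^{[g]}$; the helper names are assumptions introduced for illustration.

```python
from fractions import Fraction
from math import comb


def falling(x, g):
    """Falling factorial x^[g] = x (x - 1) ... (x - g + 1); equals 1 when g = 0."""
    product = 1
    for step in range(g):
        product *= x - step
    return product


def factorial_moment_from_pmf(n1, n2, g):
    """Sum r^[g] P(r_1 = r) using P(r_1) = C(n1-1, r-1) C(n2+1, r) / C(n, n1)."""
    total = comb(n1 + n2, n1)
    return sum(
        Fraction(falling(r, g) * comb(n1 - 1, r - 1) * comb(n2 + 1, r), total)
        for r in range(1, min(n1, n2 + 1) + 1)
    )


def factorial_moment_closed_form(n1, n2, g):
    """Closed form (n2 + 1)^[g] n1^[g] / n^[g]; requires g <= n."""
    n = n1 + n2
    return Fraction(falling(n2 + 1, g) * falling(n1, g), falling(n, g))


n1, n2 = 5, 4
mean = factorial_moment_from_pmf(n1, n2, 1)
variance = factorial_moment_from_pmf(n1, n2, 2) + mean - mean ** 2
print(mean, variance)
```

The variance is recovered from the first two factorial moments as $\mu'_{[2]} + \mu'_{[1]} - (\mu'_{[1]})^2$.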


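Applying similar methods to (2.1.7), one can find the general
factorial moments of $(r_1 - 1)$ and $(r_2 - 1)$,

$$E\left[ (r_1 - 1)^{[g]} (r_2 - 1)^{[h]} \right] = \frac{(n_1 - 1)^{[g]} (n_2 - 1)^{[h]} \binom{n - g - h}{n_1 - h}}{\binom{n}{n_1}} \qquad (2.2.6)$$

2.3  Distribution and Moments of Runs of k Kinds of Elements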


Let $a_1, a_2, \ldots, a_k$ be k kinds of elements. Suppose there
are $n_i$ elements of the i-th kind. Then let

$$n = \sum_{i=1}^{k} n_i, \qquad r_i = \sum_{j=1}^{n_i} r_{ij},$$

where $r_{ij}$ denotes the number of runs of elements of the i-th
kind of length j.

Using the same argument as in (2.1.3) gives

$$P(r_{ij}) = \frac{\prod_{i=1}^{k} \begin{bmatrix} r_i \\ r_{ij} \end{bmatrix} F(r_1, r_2, \ldots, r_k)}{\begin{bmatrix} n \\ n_i \end{bmatrix}} \qquad (2.3.1)$$

where the function $F(r_1, r_2, \ldots, r_k)$, which will be
referred to hereafter simply as $F(r_i)$, represents the number of
different arrangements of $r_1$ objects of one kind, $r_2$ objects
of a second kind, and so forth, such that no two adjacent objects
are of the same kind.

The exact expression for $F(r_i)$ can be found using generating
functions and is given in Mood (1940).
The probability $P(r_i)$ is obtained by summing (2.3.1) over
$r_{ij}$ with $r_i$ fixed by means of a generalization of identity
(2.1.4), giving

$$P(r_i) = \frac{\prod_{i=1}^{k} \binom{n_i - 1}{r_i - 1} F(r_i)}{\begin{bmatrix} n \\ n_i \end{bmatrix}} \qquad (2.3.2)$$


Moments. It is possible to find moments of $r_i$ as distributed by
(2.3.2). Since $\sum_{r_i} P(r_i) = 1$, the following is true:

$$\sum_{r_i} \prod_{i=1}^{k} \binom{n_i - 1}{r_i - 1} F(r_i) = \begin{bmatrix} n \\ n_i \end{bmatrix} \qquad (2.3.3)$$


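From (2.3.3) the moments are derived by putting $u_i = n_i - r_i$.
The factorial moments of $u_i$ are derived below.

$$\begin{aligned}
\sum_{r_i} \prod_{i} u_i^{[g_i]} \prod_{i} \binom{n_i - 1}{r_i - 1} F(r_i)
&= \sum_{r_i} \prod_{i} (n_i - r_i)^{[g_i]} \binom{n_i - 1}{r_i - 1} F(r_i) \\
&= \sum_{r_i} \prod_{i} (n_i - 1)^{[g_i]} \binom{n_i - g_i - 1}{r_i - 1} F(r_i) \\
&= \prod_{i=1}^{k} (n_i - 1)^{[g_i]} \begin{bmatrix} n - \sum g_i \\ n_i - g_i \end{bmatrix} .
\end{aligned}$$

The summation involved in the last step is given by (2.3.3). The
factorial moments of the $u_i$ can be obtained by dividing the
last equation by $\begin{bmatrix} n \\ n_i \end{bmatrix}$:

$$E\left( \prod_{i=1}^{k} u_i^{[g_i]} \right) = \frac{\prod_{i=1}^{k} (n_i - 1)^{[g_i]} \begin{bmatrix} n - \sum g_i \\ n_i - g_i \end{bmatrix}}{\begin{bmatrix} n \\ n_i \end{bmatrix}} \qquad (2.3.4)$$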


From (2.3.4) the moments of the $r_i$ may be found; the means,
variances, and covariances are

$$E(r_i) = \frac{n_i (n - n_i + 1)}{n} \qquad (2.3.5)$$

$$\sigma_{ii} = \frac{n_i^{[2]} (n - n_i + 1)^{[2]}}{n\, n^{[2]}} \qquad (2.3.6)$$

$$\sigma_{ij} = \frac{n_i^{[2]}\, n_j^{[2]}}{n\, n^{[2]}} \qquad (2.3.7)$$

The moments of the variables $r_{ij}$ in the distribution (2.3.1)
are obtained by means of identities similar to (2.3.3).

2.4  Asymptotic  Distribution 


The distributions obtained in the previous section are
asymptotically normal when the $n_i$ become large in such a way
that the ratio $n_i/n$, denoted by $e_i$, remains fixed.

The limit theorems for distributions such as (2.1.3) and (2.1.6)
cannot be derived because the number of independent variables
increases with n. The asymptotic character of the distribution
(2.1.10) will be given in the following theorems. Other theorems
of similar nature can be proved using the same procedure.

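Theorem. The variables

$$x_i = \frac{s_{1i} - n e_1^{i} e_2^{2}}{\sqrt{n}}, \quad i < k; \qquad x_k = \frac{s_{1k} - n e_1^{k} e_2}{\sqrt{n}} \qquad (2.4.1)$$

are asymptotically normally distributed with zero means and
variances and covariances

$$\sigma_{ij} = e_1^{i+j-1} e_2^{3} \left[ (i+1)(j+1) e_1 e_2 - ij\, e_2 - 2 e_1 \right], \quad i, j < k, \; i \ne j$$

$$\sigma_{ii} = e_1^{2i-1} e_2^{3} \left[ (i+1)^2 e_1 e_2 - i^2 e_2 - 2 e_1 \right] + e_1^{i} e_2^{2}, \quad i < k$$

$$\sigma_{ik} = e_1^{i+k-1} e_2^{2} \left[ (i+1) k\, e_1 e_2 - ik\, e_2 - e_1 \right], \quad i < k$$

$$\sigma_{kk} = e_1^{2k-1} e_2 \left[ k^2 (e_1 - 1) e_2 - e_1 \right] + e_1^{k} e_2$$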
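
The covariance formulas of the theorem can be sanity-checked in exact arithmetic. The sketch below builds the matrix $\|\sigma_{ij}\|$ and the matrix $\|\sigma^{ij}\|$ that arises in the proof, then verifies two things: that their product is the identity, and that the entries of $\|\sigma_{ij}\|$ sum to $e_1^2 e_2^2$, the limiting variance of $r_1/\sqrt{n}$. The exponents used (such as $i+j-1$ and $2k-1$) are a reading of the printed formulas and should be treated as an assumption; the function names are hypothetical.

```python
from fractions import Fraction


def covariance(k, e1):
    """Matrix of sigma_ij (1-based i, j stored 0-based), with e2 = 1 - e1."""
    e2 = 1 - e1
    S = [[None] * k for _ in range(k)]
    for i in range(1, k + 1):
        for j in range(1, k + 1):
            if i == k and j == k:
                v = e1 ** (2 * k - 1) * e2 * (k * k * (e1 - 1) * e2 - e1) + e1 ** k * e2
            elif i == k or j == k:
                m = min(i, j)
                v = e1 ** (m + k - 1) * e2 ** 2 * ((m + 1) * k * e1 * e2 - m * k * e2 - e1)
            else:
                v = e1 ** (i + j - 1) * e2 ** 3 * ((i + 1) * (j + 1) * e1 * e2 - i * j * e2 - 2 * e1)
                if i == j:
                    v += e1 ** i * e2 ** 2
            S[i - 1][j - 1] = v
    return S


def inverse_elements(k, e1):
    """Matrix of sigma^{ij}, the coefficients of the limiting quadratic form."""
    e2 = 1 - e1
    P = [[None] * k for _ in range(k)]
    for i in range(1, k + 1):
        for j in range(1, k + 1):
            v = 1 / e2 ** 2
            if i == k and j == k:
                v += 2 / (e1 ** k * e2) + (2 * k - 1 + (k - 1) ** 2 * e2) / e1 ** (k + 1)
            elif i == k or j == k:
                m = min(i, j)
                v += (m + m * (k - 1) * e2) / e1 ** (k + 1)
            else:
                v += i * j * e2 / e1 ** (k + 1)
                if i == j:
                    v += 1 / (e1 ** i * e2 ** 2)
            P[i - 1][j - 1] = v
    return P


def matmul(A, B):
    size = len(A)
    return [[sum(A[r][t] * B[t][c] for t in range(size)) for c in range(size)]
            for r in range(size)]


def is_identity(M):
    return all(M[r][c] == (1 if r == c else 0)
               for r in range(len(M)) for c in range(len(M)))


for k, e1 in [(1, Fraction(1, 2)), (2, Fraction(1, 2)), (2, Fraction(1, 3)), (3, Fraction(1, 2))]:
    S = covariance(k, e1)
    print(k, e1, is_identity(matmul(S, inverse_elements(k, e1))))
```

Because $s_{11} + \cdots + s_{1k} = r_1$, the sum of all entries of $\|\sigma_{ij}\|$ must equal the limiting variance of $r_1/\sqrt{n}$, which gives an independent consistency check on the exponents.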

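Proof. Let us make the substitutions

$$n_i = n e_i, \quad i = 1, 2$$

$$s_{1i} = n e_1^{i} e_2^{2} + \sqrt{n}\, x_i, \quad i = 1, 2, \ldots, k - 1$$

$$s_{1k} = n e_1^{k} e_2 + \sqrt{n}\, x_k$$

$$s_1 = n e_1 e_2 + \sqrt{n} \sum_{i=1}^{k} x_i$$

$$A = n \left( e_1 - e_1^{k+1} - k e_1^{k} e_2 \right) + \sqrt{n} \sum_{i=1}^{k-1} i\, x_i$$

in equation (2.1.10) and estimate the factorials by means of
Stirling's formula

$$m! = \sqrt{2\pi}\, m^{m + (1/2)} e^{-m} \left( 1 + O(1/m) \right) \qquad (2.4.3)$$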

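First note that the exponential factors cancel out because the sum
of the lower indices of a binomial or multinomial coefficient is
equal to the upper index. Also, simplifying the expression by
considering in detail only terms which involve the $x_i$, the
normalizing constant can be determined from the final limit
function. Any function of the parameters will be represented by
the letter K. Thus in (2.4.3) we need consider only the factor
$m^{m + (1/2)}$. All factorials will be of the form

$$m! = \left( na + \sqrt{n}\, L(x) + b \right)!$$

where $L(x)$ is a linear function of the $x_i$, and a and b are
independent of n and $x_i$. Now

$$\begin{aligned}
m^{m + (1/2)} &= \left( na + \sqrt{n}\, L(x) + b \right)^{na + \sqrt{n} L(x) + b + (1/2)} \\
&= (na)^{na + \sqrt{n} L(x) + b + (1/2)} \left( 1 + \frac{L(x)}{a \sqrt{n}} + \frac{b}{na} \right)^{na + \sqrt{n} L(x) + b + (1/2)} \\
&= K\, (na)^{\sqrt{n} L(x)} \left( 1 + \frac{L(x)}{a \sqrt{n}} + \frac{b}{na} \right)^{na + \sqrt{n} L(x) + b + (1/2)}
\end{aligned}$$

and

$$\begin{aligned}
\log m^{m + (1/2)} &= K + \sqrt{n}\, L(x) \log na + \left( na + \sqrt{n}\, L(x) + b + \tfrac{1}{2} \right) \log \left( 1 + \frac{L(x)}{a \sqrt{n}} + \frac{b}{an} \right) \\
&= K + \sqrt{n}\, L(x) \log na + \left( na + \sqrt{n}\, L(x) + b + \tfrac{1}{2} \right) \left[ \frac{L(x)}{a \sqrt{n}} + \frac{b}{an} - \frac{L^2(x)}{2 a^2 n} + O\!\left( 1/n^{3/2} \right) \right] \\
&= K + \sqrt{n}\, L(x) (1 + \log na) + \frac{1}{2a} L^2(x) + O\!\left( \frac{1}{\sqrt{n}} \right) \qquad (2.4.4)
\end{aligned}$$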

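so terms arising from b (and $b + 1/2$ in the exponent) will be
neglected, as they give rise only to terms independent of the
$x_i$ or of order $1/\sqrt{n}$. Of course,
$\log(1 + O(1/m)) = O(1/m)$. Thus, keeping significant terms only,
the result of making the substitutions above and using (2.4.3) in
(2.1.10), after taking logarithms and using (2.4.4), is

$$\begin{aligned}
-\log P(s_{1i}) = K &+ \sqrt{n} \sum_{i=1}^{k-1} x_i \left( \log n e_1^{i} e_2^{2} + 1 \right) + \sum_{i=1}^{k-1} \frac{x_i^2}{2 e_1^{i} e_2^{2}} \\
&- \sqrt{n} \left( \sum_{i=1}^{k} x_i \right) \left( \log n e_2^{2} + 1 \right) + \frac{1}{2 e_2^{2}} \left( \sum_{i=1}^{k} x_i \right)^2 \\
&+ \sqrt{n} \left( \sum_{i=1}^{k-1} i\, x_i + (k-1) x_k \right) \left( \log n e_1^{k} + 1 \right) - \frac{1}{2 e_1^{k}} \left( \sum_{i=1}^{k-1} i\, x_i + (k-1) x_k \right)^2 \\
&+ 2 \sqrt{n}\, x_k \left( \log n e_1^{k} e_2 + 1 \right) + \frac{x_k^2}{e_1^{k} e_2} \\
&- \sqrt{n} \left( \sum_{i=1}^{k} i\, x_i \right) \left( \log n e_1^{k+1} + 1 \right) + \frac{1}{2 e_1^{k+1}} \left( \sum_{i=1}^{k} i\, x_i \right)^2 + O(1/\sqrt{n}) \qquad (2.4.5)
\end{aligned}$$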

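The coefficients of $x_i$ ($i < k$) and $x_k$ are

$$\sqrt{n} \left( \log n e_1^{i} e_2^{2} + 1 - \log n e_2^{2} - 1 + i \log n e_1^{k} + i - i \log n e_1^{k+1} - i \right) = 0$$

$$\sqrt{n} \left( -\log n e_2^{2} - 1 + k \log n e_1^{k} + k - \log n e_1^{k} - 1 + 2 \log n e_1^{k} e_2 + 2 - k \log n e_1^{k+1} - k \right) = 0$$

Hence only the quadratic terms remain and (2.4.5) may be written
as

$$-\log P = K + \tfrac{1}{2} \sum_{i,j} \sigma^{ij} x_i x_j + O(1/\sqrt{n})$$

where

$$\sigma^{ij} = \frac{1}{e_2^{2}} + \frac{ij\, e_2}{e_1^{k+1}}, \quad i, j < k, \; i \ne j$$

$$\sigma^{ii} = \frac{1}{e_2^{2}} + \frac{1}{e_1^{i} e_2^{2}} + \frac{i^2 e_2}{e_1^{k+1}}, \quad i < k$$

$$\sigma^{ik} = \frac{1}{e_2^{2}} + \frac{i + i(k-1) e_2}{e_1^{k+1}}, \quad i < k$$

$$\sigma^{kk} = \frac{1}{e_2^{2}} + \frac{2}{e_1^{k} e_2} + \frac{2k - 1 + (k-1)^2 e_2}{e_1^{k+1}}$$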


It is merely a matter of straightforward multiplication of the two
matrices to verify that $\|\sigma^{ij}\|$ is the inverse of
$\|\sigma_{ij}\|$, hence is a positive definite matrix. Then

$$P = K e^{-\frac{1}{2} \sum \sigma^{ij} x_i x_j} \left( 1 + O(1/\sqrt{n}) \right) .$$

In this equation K must necessarily contain the factor
$(1/\sqrt{n})^{k}$ because there are $k + 5$ factorials in the
denominator and 5 in the numerator of (2.1.10). Since
$\Delta s_{1i} = 1$, this factor, in view of (2.4.1), may be
replaced by $\prod \Delta x_i$, so

$$P = K e^{-\frac{1}{2} \sum \sigma^{ij} x_i x_j} \prod_i \Delta x_i \left( 1 + O(1/\sqrt{n}) \right) .$$

By restricting the $x_i$ to any finite region R in the x-space,
the function $O(1/\sqrt{n})$ approaches zero uniformly as
$n \to \infty$. Thus if $A_i < B_i$ are any positive numbers such
that the corresponding values of $x_i$, say $a_i$ and $b_i$,
obtained by substituting $A_i$ and $B_i$ for $s_{1i}$ in (2.4.1),
determine a rectangular region R' ($a_i < x_i < b_i$) which lies
in R, then

$$\sum_{s_{1i} = A_i}^{B_i} P(s_{1i}) = \sum_{x_i = a_i}^{b_i} K e^{-\frac{1}{2} \sum \sigma^{ij} x_i x_j} \prod_i \Delta x_i \left( 1 + O(1/\sqrt{n}) \right) \;\longrightarrow\; \int_{a_i}^{b_i} K e^{-\frac{1}{2} \sum \sigma^{ij} x_i x_j} \prod_i dx_i \quad (n \to \infty)$$

by the definition of a definite integral and Riemann's fundamental
theorem.

Cor. 1. The variable

$$\frac{r - n e_1 e_2}{\sqrt{n}\, e_1 e_2}$$

where r is the total number of runs of one kind of element, is
asymptotically normally distributed with zero mean and unit
variance.

Cor. 2. The variable $Q = \sum \sigma^{ij} x_i x_j$ is
asymptotically distributed according to the $\chi^2$-law with k
degrees of freedom.


2.5  Longest  Run 


The probability that the longest run of a's will be of length s
can be obtained by taking $n_1$ and $n_2$ as fixed and summing the
formula

$$P(r_{1j}) = \frac{\begin{bmatrix} r_1 \\ r_{1j} \end{bmatrix} \binom{n_2 + 1}{r_1}}{\binom{n}{n_1}} \qquad (2.5.1)$$

over all values of $r_1$ and over all sets of
$r_{11}, r_{12}, \ldots, r_{1(s-1)}, r_{1s}$ which satisfy

$$\sum_{j=1}^{s} j\, r_{1j} = n_1, \qquad \sum_{j=1}^{s} r_{1j} = r_1, \qquad \text{and} \qquad r_{1s} \ge 1,$$

and such that $r_1$ exceeds neither $n_1 - s + 1$ nor $n_2 + 1$.
The probability that the longest run of either a's or b's will be
of length s can be obtained by an analogous attack upon the
formula

$$P(r_{ij}) = \frac{\begin{bmatrix} r_1 \\ r_{1j} \end{bmatrix} \begin{bmatrix} r_2 \\ r_{2j} \end{bmatrix} F(r_1, r_2)}{\binom{n}{n_1}} \qquad (2.5.2)$$

with the proviso that both $r_{1s}$ and $r_{2s}$ cannot be zero at
the same time.
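
The summation described for the longest run of a's can be sketched as follows. Each admissible set $\{r_{1j}\}$ corresponds to a partition of $n_1$ whose largest part is exactly s, so the sketch enumerates those partitions, applies $\begin{bmatrix} r_1 \\ r_{1j} \end{bmatrix}\binom{n_2+1}{r_1}$ to each, and compares the total with a brute-force count. The function names are assumptions, and $n_1 \ge 1$ is assumed.

```python
from fractions import Fraction
from itertools import combinations, groupby
from math import comb, factorial


def partitions_with_max_part(total, max_part):
    """Yield partitions of `total` (non-increasing tuples) whose largest part is max_part."""
    def helper(remaining, cap):
        if remaining == 0:
            yield ()
            return
        for part in range(min(cap, remaining), 0, -1):
            for rest in helper(remaining - part, part):
                yield (part,) + rest

    if 1 <= max_part <= total:
        for rest in helper(total - max_part, max_part):
            yield (max_part,) + rest


def prob_longest_a_run_formula(n1, n2, s):
    """Sum multinomial(r1; r_1j) * C(n2 + 1, r1) over sets with longest run s."""
    ways = 0
    for parts in partitions_with_max_part(n1, s):
        r1 = len(parts)                      # total number of runs of a's
        multinomial = factorial(r1)
        for length in set(parts):
            multinomial //= factorial(parts.count(length))   # divide by r_1j!
        ways += multinomial * comb(n2 + 1, r1)               # zero if r1 > n2 + 1
    return Fraction(ways, comb(n1 + n2, n1))


def prob_longest_a_run_exact(n1, n2, s):
    """Brute force over all C(n, n1) equally likely arrangements."""
    n = n1 + n2
    hits = 0
    for positions in combinations(range(n), n1):
        chosen = set(positions)
        seq = [i in chosen for i in range(n)]
        longest = max(len(list(g)) for is_a, g in groupby(seq) if is_a)
        hits += longest == s
    return Fraction(hits, comb(n, n1))


for s in range(1, 6):
    print(s, prob_longest_a_run_formula(5, 3, s), prob_longest_a_run_exact(5, 3, s))
```

The restriction that $r_1$ exceed neither $n_1 - s + 1$ nor $n_2 + 1$ is satisfied automatically here: the partition generator cannot produce more than $n_1 - s + 1$ parts, and `math.comb` returns zero when $r_1 > n_2 + 1$.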

3.   STATISTICAL  TESTS 

The  various  formulas  given  above  could  be  used  as  the  bases 
for  a  variety  of  statistical  tests  of  the  hypothesis  that  a's 
and  b's  are  arranged  randomly.   The  particular  formula  used 
would  depend  upon  the  conditions  given  and  upon  the  alternative 
hypothesis  against  which  one  wished  the  test  to  be  most  sensi- 
tive.  However,  calculations  of  probabilities  generally  become 
quite  involved  at  any  but  the  smallest  sample  size. 


3.1  The  Wald-Wolfowitz  Total  Number 
of  Runs  Tests 


Wald and Wolfowitz (1940) developed the total number of runs test
by using the distribution function for the total number of runs.
Suppose the two samples have been drawn at random and
independently of each other, each from a continuously distributed
population. We wish to test whether or not the parent populations
are identical. The two samples are pooled and arranged in order of
magnitude, each observation being labeled a or b according to the
sample from which it came. Let U stand for the total number of
runs of both a's and b's. Since the number of runs of a's can be
one less than, equal to, or one greater than the number of runs of
b's, U can be an odd number in two ways but can be even in only
one way.

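The probability that the total number of runs will be some even
number 2r, using (2.1.7), is

$$\Pr(U = 2r) = \frac{2 \binom{m - 1}{r - 1} \binom{n - 1}{r - 1}}{\binom{m + n}{n}} \qquad (3.1.1)$$

where m and n are the numbers of observations designated as a's
and b's. The probability that it will be some odd number,
$2r + 1$, is

$$\Pr(U = 2r + 1) = \frac{\binom{m - 1}{r - 1} \binom{n - 1}{r} + \binom{m - 1}{r} \binom{n - 1}{r - 1}}{\binom{m + n}{m}} \qquad (3.1.2)$$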
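
The two probability formulas for U can be evaluated directly. The sketch below computes the probability of each value of U from the even and odd cases and compares the result with an exhaustive enumeration for small m and n; the function names are hypothetical, and $m, n \ge 1$ with $u \ge 2$ is assumed.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, groupby
from math import comb


def prob_total_runs(m, n, u):
    """P(U = u) for m a's and n b's under random arrangement (u >= 2)."""
    if u < 2:
        raise ValueError("the total number of runs is at least 2")
    total = comb(m + n, n)
    if u % 2 == 0:
        r = u // 2
        ways = 2 * comb(m - 1, r - 1) * comb(n - 1, r - 1)
    else:
        r = (u - 1) // 2
        ways = comb(m - 1, r - 1) * comb(n - 1, r) + comb(m - 1, r) * comb(n - 1, r - 1)
    return Fraction(ways, total)


def total_runs_pmf_exact(m, n):
    """Tally U over all C(m + n, m) equally likely arrangements."""
    size = m + n
    tally = Counter()
    for positions in combinations(range(size), m):
        chosen = set(positions)
        seq = [i in chosen for i in range(size)]
        tally[sum(1 for _ in groupby(seq))] += 1
    return {u: Fraction(c, comb(size, m)) for u, c in tally.items()}


def lower_tail(m, n, u_observed):
    """P(U <= u_observed): few runs are evidence against identical populations."""
    return sum(prob_total_runs(m, n, u) for u in range(2, u_observed + 1))


print(lower_tail(4, 3, 3))
```

A one-sided lower tail is used in `lower_tail` because differences between the parent populations tend to reduce the number of runs; that choice of rejection region is stated here as an assumption about how the test is applied.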

Lehmann (1951), Dixon (1954), and Epstein (1955) discussed the
efficiency of the above test. Relative to Student's t-test, the
Wald-Wolfowitz form of the run test has an asymptotic relative
efficiency of zero and a small-sample efficiency which, when each
sample contains five or fewer observations, generally exceeds .96
and may be as high as .995. The test compares poorly with other
distribution-free tests. Epstein (1955) and Lehmann (1951)
investigated the power of this test, the former author sampling
from normal populations with homogeneous variances, the latter
sampling from any continuously distributed population. They found
the Wald-Wolfowitz runs test to be inferior in power to the
following tests: Student's t, Lehmann's (1951) most powerful test,
the Mann-Whitney test, the median test, and Epstein's (1955)
exceedance test. If the ratio m/n of sample sizes remains constant
as the sample sizes m and n approach infinity, the Wald-Wolfowitz
test is consistent. If the ratio m/n does not remain constant but
approaches zero or infinity, the test is inconsistent.

Probabilities for U have been tabulated by Swed and Eisenhart
(1943) for $m \le n \le 20$ and for certain other cases. David
(1947) has provided tables appropriate when $m + n \le 14$ and
$2 \le U \le 14$.


3.2  Length  of  Longest  Run  As  a  Test  for 
Randomness  Against  Trend  Alternatives 


This test has been proposed by Mosteller (1941). Suppose that a
series of observations has been taken upon a continuously
distributed variable and that the observations have been arranged
in the order in which they were drawn, no two observations having
been drawn simultaneously. If each observation is now labeled A or
B, depending upon whether it is above or below the median for the
entire series, the presence or absence of trend can be tested by
using as the test statistic one of the following: the length of
the longest run of A's (or B's), or the length of the longest run
considering both A's and B's. If there is an odd number of
observations, one of them will be the median and it should be
discarded. Mosteller has published appropriate tables for the
cases where $n_1 = n_2 = 5, 10, 15, 20$, or 25. The use of only
the length of the longest run ignores the "information" contained
in the lengths of the shorter runs. Bateman (1948) investigated a
case in which this statistic was found to be less powerful than
the total number of runs.

3.3  The  Sum  of  Squared  Run  Lengths 

The total number of runs does not directly take account of the
lengths of runs, which are more explicit indices of the tendency
of like objects to cluster. Ramchandran and Ranganathan (1953)
proposed a test which overcomes this objection. They introduced a
new statistic, N, which is the sum of the squares of the lengths
of runs, i.e.,

$$N = \sum_j j^2 r_{1j} + \sum_j j^2 r_{2j} .$$

Thus all runs are taken account of, but each run is permitted to
influence the test statistic in proportion to the square of its
length. Ramchandran and Ranganathan recommend the test for the
same situation dealt with by Wald and Wolfowitz, the test being
used to decide if the two samples came from identical continuous
populations. The authors, considering only the case where
$n_1 = n_2$, have tabulated the values of N required for various
levels of significance. The table values of N are exact for the
cases $3 \le n_1 \le 5$ and approximate for $6 \le n_1 \le 15$, in
the latter case having been obtained by reading points from a
Pearson type VI curve fitted to the true distribution of N.
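
The statistic N is simple to compute from an ordered sequence of labels. The following is a minimal sketch with hypothetical function names; the null distribution is obtained here by plain enumeration for small samples, which is an illustrative device and not the method used to build the published tables.

```python
from collections import Counter
from itertools import combinations, groupby
from math import comb


def sum_squared_run_lengths(sequence):
    """N = sum over all runs (of either kind) of the squared run length."""
    return sum(len(list(group)) ** 2 for _, group in groupby(sequence))


def null_distribution(n1, n2):
    """Counts of each value of N over all C(n1 + n2, n1) arrangements."""
    n = n1 + n2
    tally = Counter()
    for positions in combinations(range(n), n1):
        chosen = set(positions)
        seq = ["a" if i in chosen else "b" for i in range(n)]
        tally[sum_squared_run_lengths(seq)] += 1
    return tally


print(sum_squared_run_lengths("aaabbaabaabbab"))
print(sorted(null_distribution(3, 3).items()))
```

Perfect alternation gives the smallest possible N and complete separation of the two samples the largest, so large values of N indicate clustering of like elements.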
3.4  Dixon Test

Dixon (1940) presented a criterion for testing the hypothesis that
two samples have been drawn from populations with the same
distribution function, assuming only that the cumulative
distribution function common to the two populations is continuous.
Let the two samples $O_m$ and $O_n$ be of size m and n,
respectively. Assume that $n \le m$ without loss of generality.
Suppose the elements $u_1, u_2, \ldots, u_n$ of $O_n$ are arranged
in order from the smallest to the largest, that is,
$u_1 < u_2 < \cdots < u_n$.

This sequence can be represented by points along a line. The
elements of $O_m$, represented as points on the same line, are
divided into $(n + 1)$ groups by the sample $O_n$. Let $m_1$ be
the number of points having a value less than $u_1$, $m_i$ the
number lying between $u_{i-1}$ and $u_i$ ($i = 2, \ldots, n$), and
$m_{n+1}$ the number greater than $u_n$
($m_{n+1} = m - m_1 - m_2 - \cdots - m_n$). The criterion Dixon
proposed is

$$C^2 = \sum_{i=1}^{n+1} \left( \frac{1}{n + 1} - \frac{m_i}{m} \right)^2 .$$

The mean and variance of $C^2$ can be found to be as follows.

$$E(C^2) = \frac{n (n + m + 1)}{m (n + 1)(n + 2)}$$

$$\sigma^2_{C^2} = \frac{4 n (m - 1)(m + n + 1)(m + n + 2)}{m^3 (n + 2)^2 (n + 3)(n + 4)}$$
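
The criterion and its first two moments can be illustrated with a short computation. The sketch below evaluates $C^2$ for two samples without ties and, for small m and n, obtains the exact null mean and variance by enumerating which ranks of the pooled sample belong to $O_n$. The function names are assumptions, and the enumeration relies on the standard fact that under the null hypothesis all such rank assignments are equally likely.

```python
from bisect import bisect_left
from fractions import Fraction
from itertools import combinations


def dixon_c2(sample_n, sample_m):
    """C^2 where sample_n (size n) divides the line and sample_m is counted."""
    n, m = len(sample_n), len(sample_m)
    cuts = sorted(sample_n)
    counts = [0] * (n + 1)
    for value in sample_m:
        counts[bisect_left(cuts, value)] += 1     # number of u's below value
    return sum((Fraction(1, n + 1) - Fraction(c, m)) ** 2 for c in counts)


def exact_null_moments(n, m):
    """Exact mean and variance of C^2 over all C(n + m, n) rank assignments."""
    values = []
    for ranks_n in combinations(range(n + m), n):
        chosen = set(ranks_n)
        ranks_m = [r for r in range(n + m) if r not in chosen]
        values.append(dixon_c2(list(ranks_n), ranks_m))
    mean = sum(values) / len(values)
    variance = sum(v * v for v in values) / len(values) - mean ** 2
    return mean, variance


def stated_mean(n, m):
    return Fraction(n * (n + m + 1), m * (n + 1) * (n + 2))


def stated_variance(n, m):
    return Fraction(4 * n * (m - 1) * (m + n + 1) * (m + n + 2),
                    m ** 3 * (n + 2) ** 2 * (n + 3) * (n + 4))


print(exact_null_moments(2, 3), stated_mean(2, 3), stated_variance(2, 3))
```

Large values of $C^2$ arise when the points of $O_m$ are spread unevenly among the groups determined by $O_n$.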
Significance Values of $C^2$. Let $C^2_\alpha$ be defined as the
smallest value of c for which

$$P(C^2 \ge c) \le \alpha .$$

Then the values of $C^2_\alpha$ can be computed for small values
of m and n. The probability $P(C^2 \ge C^2_\alpha)$ will in
general be less than $\alpha$ because the distribution of $C^2$ is
not continuous.

For large values of m and n, Dixon (1940) fitted a gamma
distribution to the distribution of $nC^2$ by the method of
moments. By using the transformation $\chi^2 = nkC^2$, $nkC^2$ is
considered as distributed as chi-square with $\nu$ degrees of
freedom, where $\nu$ is not necessarily an integer. Chi-square
tables can be used for approximate values of the probability that
$nkC^2$ will exceed certain values.

4.   RUNS UP AND DOWN

4.1  Introduction

Let $S = (h_1, \ldots, h_n)$ be a random permutation of the n
unequal numbers $a_1, \ldots, a_n$, and let R be the sequence of
signs (+ or -) of the differences $h_{i+1} - h_i$
($i = 1, \ldots, n - 1$). It is assumed that each of the n!
sequences S is equally probable. A sequence of p consecutive plus
signs not immediately preceded or followed by a plus sign is
called a run up of length p; a sequence of p consecutive minus
signs not immediately preceded or followed by a minus sign is
called a run down of length p. The term "run" will denote both
runs up and runs down. As an example, if

S = (4 6 2 3 5)

then in R = (+ - + +) there are three runs, one up of length one,
one down of length one, and one up of length two. Let $r_p$ and
$r'_p$ be the number of runs up and down in R of lengths p and p
or more respectively. Levene and Wolfowitz (1944) found the exact
values of $\sigma(r_p r_q)$, $\sigma^2(r_p)$,
$\sigma(r'_p r'_q)$, $\sigma^2(r'_p)$, and $\sigma(r_p r'_q)$. The
values of $E(r_p)$ and $E(r'_p)$ were also found. Certain
misconceptions about applications of runs were also discussed by
Levene and Wolfowitz.
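
The sign sequence R and its runs up and down can be extracted as follows. This is a minimal sketch with hypothetical function names; the series is assumed to contain no equal adjacent values, and the example series is chosen here to have the sign pattern + - + +.

```python
from itertools import groupby


def difference_signs(series):
    """Signs of successive differences h_{i+1} - h_i (no ties assumed)."""
    return ["+" if later > earlier else "-"
            for earlier, later in zip(series, series[1:])]


def runs_up_and_down(series):
    """List of (sign, length) pairs, one per run in the sign sequence."""
    return [(sign, len(list(group)))
            for sign, group in groupby(difference_signs(series))]


def count_runs(series, p, or_more=False):
    """Number of runs of length p, or of length p or more if or_more is set."""
    lengths = [length for _, length in runs_up_and_down(series)]
    return sum(1 for length in lengths if (length >= p if or_more else length == p))


example = (4, 6, 2, 3, 5)   # illustrative series (an assumption)
print(difference_signs(example))
print(runs_up_and_down(example))
```

With `or_more=False` the function `count_runs` corresponds to $r_p$, and with `or_more=True` to $r'_p$.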

J. Wolfowitz (1944a) established several theorems about the
limiting distribution of a class of functions of runs up and down.
These results apply to a large class of "runs". A new recursion
formula was found by Olmstead (1946) to give the exact
distribution of arrangements of n numbers, no two alike, with runs
up and down of length p or more. These were tabled for n and p
through n = 14. An exact solution is given for $p \ge n/2$.
Olmstead (1946) also presented in simplified form the mean and
variance determined by Levene and Wolfowitz (1944). Wolfowitz
(1944) has shown that the limiting distribution for runs up and
down is a Poisson distribution. Olmstead (1946) applied his
derivation to the distribution of runs of length p or more and
obtained identical conclusions for such runs. He gives tables for
exact numbers of arrangements of n numbers with runs of length p
or more and for the fraction of arrangements of n numbers with
runs of length p or more based on the Poisson distribution.
4.2  Total Number of Runs Up and Down

The total number of runs is simply the number of runs of pluses or
minuses of length 1 or greater; Moore and Wallis (1943) showed
that when n is greater than 2, it will have an expected value of
$(2n - 1)/3$ and a variance of $(16n - 29)/90$. The total number
of runs, r, is asymptotically normally distributed, so for large
values of n the significance of the total number of runs can be
tested by treating r as a normal deviate and referring the
critical ratio

$$\frac{r - \dfrac{2n - 1}{3}}{\sqrt{\dfrac{16n - 29}{90}}}$$

to normal tables. By reducing the absolute value of the numerator
by one-half, the critical ratio can be corrected for continuity.
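
The stated mean and variance of the total number of runs up and down can be confirmed for small n by enumerating all n! permutations, and the critical ratio can be computed directly. The sketch below is illustrative only: the function names are hypothetical, and the exact check is carried out for n = 4, 5, and 6.

```python
from fractions import Fraction
from itertools import groupby, permutations
from math import sqrt


def total_runs_up_down(series):
    """Total number of runs in the sign sequence of successive differences."""
    signs = [later > earlier for earlier, later in zip(series, series[1:])]
    return sum(1 for _ in groupby(signs))


def exact_mean_variance(n):
    """Exact mean and variance of the total number of runs over all n! orders."""
    values = [total_runs_up_down(p) for p in permutations(range(n))]
    mean = Fraction(sum(values), len(values))
    variance = Fraction(sum(v * v for v in values), len(values)) - mean ** 2
    return mean, variance


def critical_ratio(r, n, continuity=True):
    """Normal deviate for r runs in a series of length n."""
    mean = (2 * n - 1) / 3
    sd = sqrt((16 * n - 29) / 90)
    deviation = r - mean
    if continuity:
        # Reduce the absolute value of the numerator by one-half, not below zero.
        shrunk = max(abs(deviation) - 0.5, 0.0)
        deviation = shrunk if deviation >= 0 else -shrunk
    return deviation / sd


for n in (4, 5, 6):
    print(n, exact_mean_variance(n))
print(critical_ratio(total_runs_up_down((4, 6, 2, 3, 5)), 5))
```

Flooring the corrected numerator at zero is a design choice made in this sketch so that the correction never reverses the sign of the deviation.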

There are (r - 1) turning points of the series, if r is the
total number of runs. The test based on the total number of
runs and a test based on the number of turning points are
equivalent. The expected number of turning points, T, is
(2n - 4)/3 and its variance is the same as that for the total
number of runs. Therefore the significance of the number of
turning points can be tested by forming the critical ratio
analogous to the one given above, referring it to normal tables.
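
The equivalence can be checked in a few lines. Writing T = r - 1
and using the mean (2n - 1)/3 and variance (16n - 29)/90 of r, a
short worked derivation (a sketch that assumes nothing beyond those
two quantities) is:

\[ E(T) = E(r) - 1 = \frac{2n - 1}{3} - 1 = \frac{2n - 4}{3}, \qquad \operatorname{Var}(T) = \operatorname{Var}(r) = \frac{16n - 29}{90}, \]

\[ \frac{T - \dfrac{2n - 4}{3}}{\sqrt{\dfrac{16n - 29}{90}}} = \frac{(r - 1) - \left(\dfrac{2n - 1}{3} - 1\right)}{\sqrt{\dfrac{16n - 29}{90}}} = \frac{r - \dfrac{2n - 1}{3}}{\sqrt{\dfrac{16n - 29}{90}}}. \]

The two critical ratios are identical, so either statistic leads to
the same conclusion.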

A. Stuart (1954) mentions in his paper that when all tests
concerned are applied to samples from normally distributed
populations, the turning point test has an asymptotic relative
efficiency of zero with respect to the regression coefficient
test and also with respect to each of the distribution free
tests of randomness with which it was compared.

4.3  Chi-square Applied to Run Frequencies

Wallis and Moore (1941) suggested a chi-square test of
significance applied in the usual way to the observed frequencies
of "interior" runs (all runs except the runs at both ends) of
like signs of length one, two, and over two, with the
corresponding expected frequencies being 5(n - 3)/12,
11(n - 4)/60, and (4n - 21)/60. There are two degrees of freedom,
one degree having been expended by obtaining n from the sample.
The test, however, is an approximate one if the significance of
the calculated chi-square is obtained from the usual chi-square
tables. This is the case because the run lengths are not entirely
independent of one another, although the chi-square test assumes
that they are. Various empirically obtained "corrections" are
offered by Wallis and Moore for use when n exceeds 12. However,
for 6 ≤ n ≤ 12, they have provided a table of exact
probabilities for the values of chi-square as calculated from the
sample. These were obtained by means of a recursion formula and
give, in effect, that proportion of the n! permutations which
yield a value of chi-square as great as or greater than the one
tabled.
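
The calculation of the statistic itself can be sketched in Python
as follows. The function names are hypothetical and the data are
assumed to contain no equal adjacent values. The sketch stops at the
chi-square value on purpose: because the usual tables are only
approximate here, the value should be referred to the exact table or
the corrections of Wallis and Moore, which are not reproduced.

```python
def interior_run_lengths(xs):
    """Lengths of runs of like signs, excluding the run at each end."""
    signs = [b > a for a, b in zip(xs, xs[1:])]
    lengths = []
    for i, s in enumerate(signs):
        if i > 0 and s == signs[i - 1]:
            lengths[-1] += 1      # same sign: extend the current run
        else:
            lengths.append(1)     # first sign or sign change: new run
    return lengths[1:-1]

def run_chi_square(xs):
    """Return (observed, expected, chi2) for run lengths 1, 2, and over 2."""
    n = len(xs)
    if n < 6:
        raise ValueError("expected frequencies are positive only for n >= 6")
    lengths = interior_run_lengths(xs)
    observed = [
        sum(1 for k in lengths if k == 1),
        sum(1 for k in lengths if k == 2),
        sum(1 for k in lengths if k > 2),
    ]
    expected = [5 * (n - 3) / 12, 11 * (n - 4) / 60, (4 * n - 21) / 60]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return observed, expected, chi2

if __name__ == "__main__":
    obs, exp, chi2 = run_chi_square([3, 1, 4, 1.5, 5, 9, 2, 6])
    print(obs, [round(e, 3) for e in exp], round(chi2, 4))
```

The three expected frequencies sum to (2n - 7)/3, which is the
expected total number of runs, (2n - 1)/3, less the two end runs.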

The test can be used as a test of randomness against either
trend or correlation alternatives. In the latter application,
if two measurements a and b have been taken on each of n objects,
the objects are arranged in order of increasing magnitude of one
continuously distributed variable and the run test is applied to
measurements on the other variable.
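
The ordering step for the correlation alternative can be sketched
in Python as below. The function name and the sample pairs are
hypothetical, and the sketch assumes that no two objects share the
same value of either measurement.

```python
def runs_after_ordering(pairs):
    """Order objects by measurement a, then find the runs up and down in b.

    pairs: list of (a, b) tuples, one per object.
    Returns the reordered b values and the lengths of the successive runs.
    """
    ordered_b = [b for _, b in sorted(pairs, key=lambda p: p[0])]
    signs = [y > x for x, y in zip(ordered_b, ordered_b[1:])]
    run_lengths = []
    for i, s in enumerate(signs):
        if i > 0 and s == signs[i - 1]:
            run_lengths[-1] += 1   # same sign: extend the current run
        else:
            run_lengths.append(1)  # first sign or sign change: new run
    return ordered_b, run_lengths

if __name__ == "__main__":
    data = [(2.0, 5.1), (0.5, 3.2), (1.1, 4.0), (3.3, 4.5), (2.7, 6.0)]
    print(runs_after_ordering(data))
```

The resulting run lengths are then treated exactly as those of an
ordinary series, for instance by the chi-square test on run
frequencies.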

5.   REMARKS 

Suppose  we  have  a  random  sample  of  m  observations  on  one 
variable  and  a  similar  sample  of  n  observations  on  another  var- 
iable.  Suppose  further  that  nothing  is  known  a  priori  about  the 
distribution  of  each  except  that  both  are  continuous  and  it  is 
desired  to  test  whether  the  two  distributions  are  identical. 
This  problem  is  of  great  importance  and  occurs  frequently.   In 
quality  control  of  manufactured  output  it  may  occur,  for  ex- 
ample, if  we  wish  to  test  whether  the  output  of  two  machines, 
two  workers,  two  different  processes,  or  that  from  raw  material 
obtained from two different sources is the same. Naturally,
the problem may arise not only for two samples but, in general,
for a larger number of samples. Runs up and down are widely used
in quality control and have been applied to economic time series.

This report is a summary of the principal papers published
since 1940 on the theory of runs. Also presented in this paper
are statistical tests derived from the main distributions in
the theory of runs. All of these papers are discussed in brief
except for Mood (1940), which has been discussed in detail.

The "two-sample" problem is examined by Mood (1940), using
runs. Suppose there are n elements of two kinds, say, n_1 a's
and n_2 = n - n_1 b's, and that these are arranged at random in
a row. If r_ij (i = 1, 2) is the number of runs of length j of
elements of variety i, the probability of a given set of values
of r_ij is derived. Besides this basic distribution function,
there are certain marginal distributions such as that for the
occurrence of a given set of runs in the a's regardless of how
the b's fall, or that for r_1 and r_2 if these are respectively
the total number of runs of a's and of b's, or that of r_1 or
r_2 alone. Factorial moments, means, variances, and covariances
are found. Similar results are obtained in case there are
more than two kinds of elements.

Wald and Wolfowitz used the distribution function for the
total number of runs (irrespective of length) to provide a test
of the hypothesis that two samples have come from the same
population. A test which takes into account the length of the
runs is discussed. Dixon's criterion for testing the hypothesis
that two samples have been drawn from populations with the same
distribution function is also presented.

Runs up and down are also considered in brief. A chi-square
test of significance applied in the usual way to the observed
frequencies of "interior" runs of like signs of length one, two,
and over two, as suggested by Wallis and Moore (1941), is
discussed.


